Multiple Choice Learning
Multiple Choice Learning of Low Rank Adapters for Language Modeling
Letzelter, Victor, Malard, Hugo, Fontaine, Mathieu, Richard, Gaël, Essid, Slim, Bursuc, Andrei, Pérez, Patrick
We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the Winner-Takes-All (WTA) loss to efficiently handle ambiguity through Low-Rank Adaptation (LoRA). We provide a theoretical interpretation of applying Multiple Choice Learning to Language Modeling, assuming the data is generated from a mixture of distributions. To illustrate the proposed approach, we use data sampled from mixtures of Markov chains. We then demonstrate with extensive experiments on real-world visual and audio captioning tasks that our method achieves high diversity and relevance in generated outputs.
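To make the training scheme concrete, below is a minimal sketch of the Winner-Takes-All loss over K hypotheses, assuming each hypothesis (e.g. each LoRA adapter in LoRA-MCL) yields a per-sample negative log-likelihood; the `wta_loss` helper and the (batch, K) layout are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the Winner-Takes-All (WTA) loss used in MCL-style
# training. Assumption: each of K hypotheses (e.g. K LoRA adapters)
# produces a per-sample negative log-likelihood of the target sequence.
import torch

def wta_loss(hypothesis_nll: torch.Tensor) -> torch.Tensor:
    """hypothesis_nll: (batch, K). Only the winning (lowest-NLL)
    hypothesis of each sample receives gradient, so the K hypotheses
    specialize on different plausible continuations."""
    winner_nll, _ = hypothesis_nll.min(dim=1)
    return winner_nll.mean()

# Toy usage: 4 samples scored under K=3 hypotheses.
nll = torch.rand(4, 3, requires_grad=True)
wta_loss(nll).backward()  # gradient reaches only the per-sample winners
```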
Multiple Choice Learning for Efficient Speech Separation with Many Speakers
Perera, David, Derrida, François, Mariotte, Théo, Richard, Gaël, Essid, Slim
Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground-truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performance of PIT while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting.
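As a rough illustration of the computational argument, the sketch below contrasts PIT's factorial permutation search with MCL's independent per-source assignment, given a matrix of pairwise losses; the `pairwise` matrix and helper names are assumptions (a real system would use something like negative SI-SDR), not the paper's code.

```python
# Sketch: PIT searches all S! permutations, while MCL lets each
# ground-truth source keep its best-matching prediction (O(S^2)).
# pairwise[i, j] is a placeholder loss of prediction i vs. source j.
from itertools import permutations
import torch

def pit_loss(pairwise: torch.Tensor) -> torch.Tensor:
    S = pairwise.shape[0]
    costs = torch.stack([
        pairwise[list(perm), list(range(S))].mean()
        for perm in permutations(range(S))
    ])
    return costs.min()  # cost of the best one-to-one assignment

def mcl_loss(pairwise: torch.Tensor) -> torch.Tensor:
    # Each source independently selects its best prediction (WTA).
    return pairwise.min(dim=0).values.mean()

pairwise = torch.rand(3, 3)  # toy losses for S = 3 speakers
print(pit_loss(pairwise), mcl_loss(pairwise))
```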
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
Perera, David, Letzelter, Victor, Mariotte, Théo, Cortés, Adrien, Chen, Mickael, Essid, Slim, Richard, Gaël
We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all (WTA) scheme, which promotes the diversity of the predictions. However, this scheme may converge toward an arbitrarily suboptimal local minimum, due to the greedy nature of WTA. We overcome this limitation using annealing, which enhances the exploration of the hypothesis space during training. We leverage insights from statistical physics and information theory to provide a detailed description of the model training trajectory. Additionally, we validate our algorithm through extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation.
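A minimal sketch of the annealed assignment, assuming a Boltzmann (softmax) weighting of hypotheses by their losses with a temperature driven toward zero over training; the exact schedule and weighting used in aMCL may differ.

```python
# Sketch of annealed WTA: at temperature T, every hypothesis gets a
# share of the gradient (exploration); as T -> 0 this recovers the
# hard winner-takes-all assignment. Schedule details are assumptions.
import torch

def annealed_wta_loss(hypothesis_loss: torch.Tensor, T: float) -> torch.Tensor:
    """hypothesis_loss: (batch, K) per-hypothesis losses."""
    # Detach the weights so the losses, not the assignment, get gradient.
    weights = torch.softmax(-hypothesis_loss.detach() / T, dim=1)
    return (weights * hypothesis_loss).sum(dim=1).mean()

losses = torch.rand(4, 3, requires_grad=True)
for T in (1.0, 0.1, 0.01):  # toy annealing of the temperature
    print(T, annealed_wta_loss(losses, T).item())
```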
Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
Guzman-Rivera, Abner, Batra, Dhruv, Kohli, Pushmeet
We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M-Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper bound on this loss function. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy.
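The paper's formulation is max-margin over structured outputs; as a purely illustrative stand-in, the toy gradient-based sketch below shows how the underlying min-over-hypotheses ("oracle") set loss drives M predictors to specialize on different modes of an ambiguous target.

```python
# Toy illustration of the min-over-hypotheses ("oracle") set loss: with
# M=2 scalar predictors and bimodal targets (-1 / +1), scoring the set
# by its best member lets each head specialize on one mode. This is a
# gradient-based stand-in, not the paper's max-margin structured setup.
import torch

heads = torch.nn.Parameter(torch.randn(2))        # M = 2 constant predictors
opt = torch.optim.SGD([heads], lr=0.1)
for _ in range(200):
    y = torch.sign(torch.rand(64) - 0.5)          # ambiguous bimodal targets
    losses = (heads[None, :] - y[:, None]) ** 2   # (batch, M) task losses
    loss = losses.min(dim=1).values.mean()        # set scored by best member
    opt.zero_grad()
    loss.backward()
    opt.step()
print(heads)  # the heads typically converge near the two modes -1 and +1
```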